Correlating Cost with Performance in LLVM
Abstract
A common technique for exploiting data-level parallelism is code vectorization. When a compiler performs it, the compiler must find a valid vectorization and assess its benefit. In LLVM, this analysis is based on a cost calculation, which approves the transformation if the cost of the vectorized code is lower than that of the original scalar code. However, the calculated cost does not correlate with the actual speedup. We therefore propose a pluggable cost model that correctly accounts for vectorization overheads and further features of the target hardware platform. Using such a platform-specific model, the compiler can assess a code transformation's impact on application performance, make safe choices about whether to transform, and compare different optimization options.
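The decision rule described in the abstract can be illustrated with a minimal sketch. It is not LLVM's implementation: the `CostModel` interface, both model classes, the `should_vectorize` helper, and all cost numbers are hypothetical and chosen only to show how swapping the cost model changes the outcome of the same "vector cost lower than scalar cost" test.

```python
from dataclasses import dataclass
from typing import Protocol


class CostModel(Protocol):
    """Hypothetical plug-in interface, not an LLVM API."""

    def scalar_cost(self, ops: int) -> float:
        """Estimated cost of one scalar loop iteration with `ops` operations."""
        ...

    def vector_cost(self, ops: int, width: int) -> float:
        """Estimated cost of one vector iteration covering `width` elements."""
        ...


@dataclass(frozen=True)
class UnitCostModel:
    """Assumes every operation costs 1, scalar or vector, with no overhead."""

    def scalar_cost(self, ops: int) -> float:
        return float(ops)

    def vector_cost(self, ops: int, width: int) -> float:
        return float(ops)


@dataclass(frozen=True)
class OverheadCostModel:
    """Adds a made-up fixed overhead per vector iteration (for example,
    packing and unpacking values) on top of the unit cost."""

    overhead: float

    def scalar_cost(self, ops: int) -> float:
        return float(ops)

    def vector_cost(self, ops: int, width: int) -> float:
        return float(ops) + self.overhead


def should_vectorize(model: CostModel, ops: int, width: int) -> bool:
    """Approve vectorization only if processing `width` elements is
    estimated to be cheaper in vector form than in scalar form."""
    scalar_total = width * model.scalar_cost(ops)
    vector_total = model.vector_cost(ops, width)
    return vector_total < scalar_total


if __name__ == "__main__":
    # Same loop, same rule, different models: the verdict can differ.
    print(should_vectorize(UnitCostModel(), ops=4, width=4))
    print(should_vectorize(OverheadCostModel(overhead=20.0), ops=4, width=4))
```

Keeping the model behind a small interface means the comparison itself never changes; only the estimates do, which is the sense in which a cost model can be called pluggable here.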
Similar resources
Lowering C11 Atomics for ARM in LLVM
This report explores the way LLVM generates the memory barriers needed to support the C11/C++11 atomics for ARM. I measure the influence of memory barriers on performance, and I show that in some cases LLVM generates too many barriers. By leaving these barriers out, performance increases significantly. I introduce two LLVM passes, which will remove these extra barriers, improving performance in...
Generalizing loop-invariant code motion in a real-world compiler
Motivated by the perpetual goal of automatically generating efficient code from high-level programming abstractions, compiler optimization has developed into an area of intense research. Apart from general-purpose transformations which are applicable to all or most programs, many highly domain-specific optimizations have also been developed. In this project, we extend such a domain-specific com...
Boosting Instruction Set Simulator Performance with Parallel Block Optimisation and Replacement
Time-to-market is a critical factor in the commercial success of new consumer devices. To minimise delays, system developers and third party software vendors must be able to test their applications before the hardware platform becomes available. Instruction Set Simulators (ISS’s) underpin this early development by emulating new platforms on ordinary desktop machines. As target platforms become ...
Dominant block guided optimal cache size estimation to maximize IPC of embedded software
Embedded system software is highly constrained in terms of performance, memory footprint, energy consumption, and implementation cost. It is always desirable to obtain better Instructions per Cycle (IPC). The instruction cache contributes significantly to improving IPC. Cache memories are realized on the same chip where the processor is running. This considerably increases the system cost as well. He...
Fast Instruction Set Simulation Using LLVM-based Dynamic Translation
In the development of embedded systems, Instruction-Set Simulators (ISS) play an important role. When using an ISS, simulation speed is a significant issue. In this paper, we present a dynamic translation technique that uses the LLVM open-source compiler infrastructure to increase the simulation speed. Our dynamic translation technique translates hot basic blocks of the target instruction set ...
Publication date: 2017